<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Architecting Transactional Data Store applications</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="transapp.html" title="Chapter 12.  Berkeley DB Transactional Data Store Applications" />
    <link rel="prev" href="transapp_fail.html" title="Handling failure in Transactional Data Store applications" />
    <link rel="next" href="transapp_env_open.html" title="Opening the environment" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 12.1.6.2</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Architecting Transactional Data
        Store applications</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 12.  Berkeley DB Transactional Data Store Applications </th>
          <td width="20%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="transapp_app"></a>Architecting Transactional Data
        Store applications</h2>
          </div>
        </div>
      </div>
      <p> 
        When building Transactional Data Store applications, the main
        architectural decisions concern application startup (running
        recovery) and the handling of system or application failure. For
        details on performing recovery, see <a class="xref" href="transapp_recovery.html" title="Recovery procedures">Recovery procedures</a>. 
    </p>
      <p> 
        Recovery in a database environment is a single-threaded
        procedure; that is, one thread of control or process must
        complete database environment recovery before any other thread
        of control or process operates in the Berkeley DB environment. 
    </p>
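      <p>
        When the application itself is responsible for running recovery,
        it must make sure that only one process does so at a time. The
        following is a minimal POSIX sketch of one way to do that, assuming
        all cooperating processes agree on a lock file; the lock path and
        the <code class="literal">placeholder_recover()</code> and
        <code class="literal">run_recovery_exclusively()</code> functions are
        hypothetical. It holds an exclusive <code class="literal">fcntl()</code>
        lock for the duration of recovery. Other processes could take a
        shared (<code class="literal">F_RDLCK</code>) lock on the same file
        while they open the environment, so that they wait until recovery
        has finished.
      </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the application's recovery step, for example
 * opening the environment with DB_CREATE | DB_RECOVER and closing it. */
int placeholder_recover(void)
{
    return 0;
}

/*
 * Run recover_fn() while holding an exclusive fcntl() lock on lock_path,
 * so that at most one cooperating process performs recovery at a time.
 * Returns recover_fn()'s result, or -1 if the lock could not be taken.
 */
int run_recovery_exclusively(const char *lock_path, int (*recover_fn)(void))
{
    struct flock fl = {0};
    int fd, ret, saved_errno;

    if ((fd = open(lock_path, O_RDWR | O_CREAT, 0600)) == -1)
        return -1;

    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET; /* l_start = 0 and l_len = 0: the whole file */

    /* F_SETLKW blocks until no other process holds a conflicting lock. */
    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR) {
            saved_errno = errno;
            close(fd);
            errno = saved_errno;
            return -1;
        }
    }

    ret = recover_fn();

    /* Closing the descriptor releases the lock. The kernel also releases
     * it if this process dies part-way through recovery. */
    close(fd);
    return ret;
}
```
      <p>
        Because the kernel drops <code class="literal">fcntl()</code> locks
        when the holding process exits, a recovery process that crashes
        cannot leave the lock held; the next process to acquire it would
        simply run recovery again.
      </p>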
      <p> 
        Performing recovery first marks any existing database
        environment as "failed" and then removes it, causing threads
        of control running in the database environment to fail and
        return to the application. This feature allows applications to
        recover environments without concern for threads of control
        that might still be running in the removed environment. The
        subsequent re-creation of the database environment is
        serialized, so multiple threads of control attempting to
        create a database environment will serialize behind a single
        creating thread.
    </p>
      <p> 
        One consideration when removing a database environment (as part
        of recovery) that may be in use by another thread is the type of
        mutex used by the Berkeley DB library. If the environment fails
        while test-and-set mutexes are in use, threads of control waiting
        on a mutex when the environment is marked "failed" will quickly
        notice the failure and return an error from the Berkeley DB API.
        If the environment fails while blocking mutexes are in use, and
        the underlying system mutex implementation does not unblock mutex
        waiters after the thread of control holding the mutex dies,
        threads waiting on a mutex when the environment is recovered
        might hang forever. Applications blocked on events (for example,
        an application blocked on a network socket or a GUI event) may
        also fail to notice environment recovery within a reasonable
        amount of time. Systems with such mutex implementations are rare,
        but they do exist. Applications on such systems should do one of
        the following: use an application architecture in which the
        thread recovering the database environment can explicitly
        terminate any process using the failed environment; configure
        Berkeley DB to use test-and-set mutexes; or incorporate some form
        of long-running timer or watchdog process to wake or kill blocked
        processes should they block for too long. 
    </p>
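      <p>
        The watchdog option can be as simple as a timer that terminates a
        process which stays blocked too long. In this minimal POSIX sketch
        (<code class="literal">run_with_watchdog()</code>,
        <code class="literal">simulated_hang()</code> and
        <code class="literal">simulated_work()</code> are hypothetical names),
        a unit of work runs in a child process that first arms
        <code class="literal">alarm()</code>. Because the default action for
        <code class="literal">SIGALRM</code> is to terminate the process, a
        child stuck on a mutex that is never granted is killed rather than
        hanging forever, and the parent learns that the watchdog fired,
        which it could treat as its cue to recover the environment.
      </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical work units used to exercise the watchdog. */
void simulated_hang(void) { pause(); }  /* blocks until a signal arrives */
void simulated_work(void) { }           /* finishes immediately */

/*
 * Run work() in a child process guarded by a watchdog timer of `seconds`.
 * Returns 1 if the watchdog killed the child, 0 if work() finished in
 * time, and -1 on any other outcome or on error.
 */
int run_with_watchdog(void (*work)(void), unsigned int seconds)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);  /* default action: terminate */
        alarm(seconds);            /* arm the watchdog */
        work();
        alarm(0);                  /* finished in time: disarm */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return 1;                  /* the watchdog fired */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```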
      <p>
        Regardless, it makes little sense for multiple threads of
        control to simultaneously attempt recovery of a database
        environment, since the last one to run will remove all
        database environments created by the threads of control that
        ran before it. However, some applications may find it useful
        to have a single thread of control perform recovery and then
        remove the database environment, after which the application
        launches a number of processes, any of which will create the
        database environment and continue forward.
    </p>
      <p>
        There are four ways to architect Berkeley DB Transactional Data
        Store applications. The choice usually depends on whether the
        application consists of a single process or a group of processes
        descended from a single process (for example, a server started
        when the system first boots), or of unrelated processes (for
        example, processes started by web connections or by users logged
        into the system).
    </p>
      <div class="orderedlist">
        <ol type="1">
          <li>
            <p>
                The first way to architect Transactional Data Store
                applications is as a single process (the process may
                or may not be multithreaded).
            </p>
            <p> 
                When this process starts, it runs recovery on the
                database environment and then opens its databases. The
                application can subsequently create new threads as it
                chooses. Those threads can either share already open
                Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and <a href="../api_reference/C/db.html" class="olink">DB</a> handles, or create their
                own. In this architecture, databases are rarely opened
                or closed when more than a single thread of control is
                running; that is, they are opened when only a single
                thread is running, and closed after all threads but
                one have exited. The last thread of control to exit
                closes the databases and the database environment.
            </p>
            <p> 
                This architecture is the simplest to implement because
                thread serialization is easy and failure detection
                does not require monitoring multiple processes. 
            </p>
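            <p>
                The ordering described above can be sketched with POSIX
                threads. In this minimal sketch the Berkeley DB calls are
                left as comments, and <code class="literal">struct shared_handles</code>,
                <code class="literal">worker()</code> and
                <code class="literal">run_single_process_app()</code> are
                hypothetical. Recovery and opens happen while only one
                thread is running, the workers then share the open handles,
                and the handles are closed only after every worker has been
                joined. In a real application, handles shared between
                threads must be opened with the
                <code class="literal">DB_THREAD</code> flag.
            </p>
```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4

/*
 * Hypothetical stand-in for the DB_ENV and DB handles opened before any
 * worker starts. Real handles shared by threads are opened with DB_THREAD.
 */
struct shared_handles {
    pthread_mutex_t lock;
    int operations;         /* illustrative count of completed work */
};

static void *worker(void *arg)
{
    struct shared_handles *h = arg;

    /* A real worker would run transactions using the shared handles. */
    pthread_mutex_lock(&h->lock);
    h->operations++;
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Returns the number of operations performed, or -1 if a thread failed
 * to start. */
int run_single_process_app(void)
{
    struct shared_handles h = { PTHREAD_MUTEX_INITIALIZER, 0 };
    pthread_t tids[NTHREADS];
    int i, created = 0;

    /* 1. Only this thread is running: run recovery, then open the
     *    environment and databases (omitted in this sketch). */

    /* 2. Start worker threads that share the already-open handles. */
    for (i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tids[i], NULL, worker, &h) != 0)
            break;
        created++;
    }

    /* 3. Wait for every worker to exit. */
    for (i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    /* 4. Only this thread is left: close the databases and then the
     *    environment (omitted in this sketch). */
    return created == NTHREADS ? h.operations : -1;
}
```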
            <p>
                If the application's thread model allows processes
                to continue after thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>
                method can be used to determine if the database
                environment is usable after thread failure. If the
                application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>, or
                <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>,
                the application must
                behave as if there has been a system failure,
                performing recovery and re-creating the database
                environment. Once these actions have been taken, other
                threads of control can continue (as long as all
                existing Berkeley DB handles are first discarded).
            </p>
            <p>
                Note that by default <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> will only notify the
                calling thread that the database environment is unusable.
                However, you can optionally cause <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> to broadcast
                this to other threads of control by using the
                <code class="literal">--enable-failchk_broadcast</code> flag when you
                configure your Berkeley DB library. If this option is turned
                on, then all threads of control using the database
                environment will return
                <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
                when they attempt to obtain a mutex lock. In this
                situation, a <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_FAILCHK_PANIC" class="olink">DB_EVENT_FAILCHK_PANIC</a> or
                <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_MUTEX_DIED" class="olink">DB_EVENT_MUTEX_DIED</a> event will also be
                raised. (Use <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV-&gt;set_event_notify()</a> to receive event notifications.)
            </p>
          </li>
          <li>
            <p> 
                The second way to architect Transactional Data
                Store applications is as a group of related processes
                (the processes may or may not be multithreaded). 
            </p>
            <p>
                This architecture requires that the order in which
                threads of control are created be controlled, so that
                database environment recovery is serialized.
            </p>
            <p>
                In addition, this architecture requires that
                threads of control be monitored. If any thread of
                control exits with open Berkeley DB handles, the
                application may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> method to detect
                lost mutexes and locks and determine if the
                application can continue. If the application does not
                call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>, or <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns that the
                database environment can no longer be used, the
                application must behave as if there has been a system
                failure, performing recovery and creating a new
                database environment. Once these actions have been
                taken, other threads of control can be continued (as
                long as all existing Berkeley DB handles are first
                discarded).
            </p>
            <p> 
                The easiest way to structure groups of related
                processes is to first create a single "watcher"
                process (often a script) that starts when the system
                first boots, runs recovery on the database environment
                and then creates the processes or threads that will
                actually perform work. The watcher has no further
                responsibilities other than to wait on the threads of
                control it has started, to ensure that none of them
                exits unexpectedly. If a thread of control exits,
                the watcher process optionally calls the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>
                method. If the application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>
                or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns that the environment can no
                longer be used, the watcher kills all of the threads
                of control using the failed environment, runs
                recovery, and starts new threads of control to perform
                work. 
            </p>
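            <p>
                The watcher described above can be sketched with
                <code class="literal">fork()</code> and
                <code class="literal">waitpid()</code>. This is a minimal
                POSIX illustration, not a complete implementation:
                <code class="literal">run_recovery()</code> and
                <code class="literal">worker_main()</code> are hypothetical
                placeholders (the latter simulates one crash so that the
                restart path runs), and the sketch takes the path in which
                <code class="literal">DB_ENV-&gt;failchk()</code> is not called,
                so any abnormal worker exit causes the watcher to kill the
                remaining workers, run recovery, and start a new generation
                of workers.
            </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

static int recoveries;  /* number of times recovery has been run */

/* Hypothetical placeholder: a real watcher would open the environment with
 * DB_CREATE | DB_RECOVER (and close it again) here. */
static void run_recovery(void)
{
    recoveries++;
}

/* Hypothetical worker body. Worker 0 of the first generation simulates a
 * crash by exiting with a non-zero status. */
static int worker_main(int id, int generation)
{
    return (id == 0 && generation == 0) ? 1 : 0;
}

/* Fork n workers; pids[i] is 0 for any worker that could not be started.
 * Returns the number of workers started. */
static int start_workers(pid_t *pids, int n, int generation)
{
    int i, started = 0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(worker_main(i, generation));
        pids[i] = pid > 0 ? pid : 0;
        if (pid > 0)
            started++;
    }
    return started;
}

/* Returns the number of recoveries run (including the initial one), or -1. */
int watch_workers(int n)
{
    pid_t pids[MAX_WORKERS];
    int i, found, status, running, generation = 0;

    if (n < 1 || n > MAX_WORKERS)
        return -1;
    recoveries = 0;
    run_recovery();                      /* recover before any worker runs */
    running = start_workers(pids, n, generation);

    while (running > 0) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (found = 0, i = 0; i < n; i++)
            if (pids[i] == pid) {
                pids[i] = 0;
                found = 1;
            }
        if (!found)
            continue;                    /* not one of our workers */
        running--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            continue;                    /* clean exit */

        /* Abnormal exit: stop every remaining worker, recover, restart. */
        for (i = 0; i < n; i++)
            if (pids[i] != 0) {
                kill(pids[i], SIGKILL);
                waitpid(pids[i], NULL, 0);
                pids[i] = 0;
            }
        run_recovery();
        running = start_workers(pids, n, ++generation);
    }
    return recoveries;
}
```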
          </li>
          <li>
            <p>
                The third way to architect Transactional Data Store
                applications is as a group of related processes that rely
                on <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> broadcasting to inform other threads and
                processes that recovery is required. <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>
                broadcasting is not enabled by default in the Berkeley DB
                library, but using broadcasting means that a watcher
                process is not required. Instead, if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> detects
                a failure, all other threads and processes operating in
                that environment are also notified of the failure, so
                that they know to run recovery.
            </p>
            <p>
                To enable <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> broadcasting, use the
                <code class="literal">--enable-failchk_broadcast</code> flag when you
                configure the library. On Windows, use
                <code class="literal">HAVE_FAILCHK_BROADCAST</code> when you compile
                the library.
            </p>
            <p>
                If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> broadcasting is enabled for your library
                and a thread of control encounters a failure when
                <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> is run, then all other threads and processes
                operating in that environment will be notified.  If a
                failure is broadcast, then threads and processes will
                receive 
                <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
                when they attempt to perform any one of a range of
                activities, including:
            </p>
            <div class="itemizedlist">
              <ul type="disc">
                <li>
                  <p>
                        When entering a DB API.
                    </p>
                </li>
                <li>
                  <p>
                        When locking a mutex.
                    </p>
                </li>
                <li>
                  <p>
                        When performing disk or network I/O.
                    </p>
                </li>
              </ul>
            </div>
            <p>
                Threads and processes that are
                monitoring events will also receive
                <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_FAILCHK_PANIC" class="olink">DB_EVENT_FAILCHK_PANIC</a> or
                <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_MUTEX_DIED" class="olink">DB_EVENT_MUTEX_DIED</a>. You use
                <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV-&gt;set_event_notify()</a> to examine events.
            </p>
          </li>
          <li>
            <p> 
                The fourth way to architect Transactional Data Store
                applications is as a group of unrelated processes (the
                processes may or may not be multithreaded). This is
                the most difficult architecture to implement because,
                on some systems, it is difficult to find and monitor
                unrelated processes. There are several possible
                techniques for implementing this architecture.
            </p>
            <p>
                One solution is to log a thread of control ID when
                a new Berkeley DB handle is opened. For example, an
                initial "watcher" process could run recovery on the
                database environment and then create a sentinel file.
                Any "worker" process wanting to use the environment
                would check for the sentinel file. If the sentinel
                file does not exist, the worker would fail or wait for
                the sentinel file to be created. Once the sentinel
                file exists, the worker would register its process ID
                with the watcher (via shared memory, IPC or some other
                registry mechanism), and then the worker would open
                its <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed. When the worker
                finishes using the environment, it would unregister
                its process ID with the watcher. The watcher
                periodically checks to ensure that no worker has
                failed while using the environment. If a worker fails
                while using the environment, the watcher removes the
                sentinel file, kills all of the workers currently
                using the environment, runs recovery on the
                environment, and finally creates a new sentinel file.
            </p>
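            <p>
                The sentinel-file handshake can be sketched with ordinary
                file operations. In this minimal POSIX sketch, assuming the
                watcher and workers agree on a sentinel path (hypothetical
                here), the watcher calls
                <code class="literal">create_sentinel()</code> after recovery
                and <code class="literal">remove_sentinel()</code> before
                killing workers and recovering again, while a worker calls
                <code class="literal">wait_for_sentinel()</code> before
                registering and opening its handles. All three function
                names are hypothetical, and process registration is not
                shown.
            </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Watcher: publish the sentinel once recovery has completed. */
int create_sentinel(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd == -1)
        return -1;
    return close(fd);
}

/* Watcher: withdraw the sentinel before killing workers and recovering.
 * A sentinel that is already missing is not an error. */
int remove_sentinel(const char *path)
{
    if (unlink(path) == -1 && errno != ENOENT)
        return -1;
    return 0;
}

/* Worker: wait up to timeout_sec seconds for the sentinel to appear.
 * Returns 0 once it exists, or -1 if the timeout expires first. */
int wait_for_sentinel(const char *path, unsigned int timeout_sec)
{
    time_t deadline = time(NULL) + (time_t)timeout_sec;

    for (;;) {
        if (access(path, F_OK) == 0)
            return 0;
        if (time(NULL) >= deadline)
            return -1;
        sleep(1);  /* poll once per second */
    }
}
```
            <p>
                A worker that gets -1 back can fail or keep waiting, as
                described above.
            </p>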
            <p>
                The weakness of this approach is that, on some
                systems, it is difficult to determine if an unrelated
                process is still running. For example, POSIX systems
                generally disallow sending signals to unrelated
                processes. The trick to monitoring unrelated processes
                is to find a system resource held by the process that
                will be modified if the process dies. On POSIX
                systems, flock- or fcntl-style locking will work, as
                will LockFile on Windows systems. Other systems may
                have to use other process-related information such as
                file reference counts or modification times. In the
                worst case, threads of control can be required to
                periodically re-register with the watcher process: if
                the watcher has not heard from a thread of control in
                a specified period of time, the watcher will take
                action, recovering the environment.
            </p>
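            <p>
                The <code class="literal">fcntl()</code>-style technique can be
                sketched as follows; this is a minimal POSIX illustration
                with hypothetical file names and helper functions. Each
                worker takes a write lock on its own file and keeps the
                descriptor open; the watcher asks, with
                <code class="literal">F_GETLK</code>, whether a conflicting
                lock is still held. Because the kernel drops the lock when
                the holding process exits for any reason, a missing lock
                means the worker is gone.
                <code class="literal">spawn_locked_worker()</code> and
                <code class="literal">stop_worker()</code> exist only to
                demonstrate the check.
            </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker: lock a per-worker file and keep the descriptor open for the life
 * of the process. Returns the descriptor, or -1 on error. */
int hold_liveness_lock(const char *path)
{
    struct flock fl = {0};
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return -1;
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;  /* l_start = 0 and l_len = 0: the whole file */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Watcher: returns 1 if another process still holds a lock on the worker's
 * file, 0 if no lock is held, or -1 on error. */
int worker_is_alive(const char *path)
{
    struct flock fl = {0};
    int ret, fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    fl.l_type = F_RDLCK;     /* would a shared lock conflict? */
    fl.l_whence = SEEK_SET;
    ret = fcntl(fd, F_GETLK, &fl);
    close(fd);
    if (ret == -1)
        return -1;
    return fl.l_type != F_UNLCK;  /* F_UNLCK: no conflicting lock exists */
}

/* Demonstration only: fork a child that takes the lock and then blocks.
 * Returns the child's pid once the lock is held, or -1 on error. */
pid_t spawn_locked_worker(const char *path)
{
    int fds[2];
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (hold_liveness_lock(path) == -1)
            _exit(1);
        (void)write(fds[1], "x", 1);  /* tell the parent the lock is held */
        pause();                       /* simulate work until killed */
        _exit(0);
    }
    close(fds[1]);
    n = read(fds[0], &c, 1);
    close(fds[0]);
    if (n != 1) {
        waitpid(pid, NULL, 0);         /* child exited without the lock */
        return -1;
    }
    return pid;
}

/* Demonstration only: kill and reap a worker started above. */
void stop_worker(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}
```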
            <p> 
                The Berkeley DB library includes one built-in
                implementation of this approach, the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV-&gt;open()</a>
                method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag: 
            </p>
            <p>
                If the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is set, each process
                opening the database environment first checks to see
                if recovery needs to be performed. If recovery needs
                to be performed for any reason (including the initial
                creation of the database environment), and
                <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is also specified, recovery will be
                performed and then the open will proceed normally. If
                recovery needs to be performed and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not
                specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
                will be returned. If recovery does not need to be performed, <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a>
                will be ignored. 
            </p>
            <p>
                Before recovery actually begins, the
                <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_PANIC" class="olink">DB_EVENT_REG_PANIC</a> event is set for the environment.
                Application processes that use the
                <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV-&gt;set_event_notify()</a> method will be notified the next
                time they perform an operation in the environment. Processes
                receiving this event should exit the environment.
                In addition, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_ALIVE" class="olink">DB_EVENT_REG_ALIVE</a> event is triggered
                if other processes are currently attached to the
                environment. Only the process performing the recovery
                receives this event notification, and it receives it
                once for each process still attached to the
                environment. The parameter of the
                <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV-&gt;set_event_notify()</a> callback contains the process
                identifier of the attached process. The process
                performing the recovery can then signal the attached
                process or perform some other operation before
                recovery (for example, kill the attached process). 
            </p>
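            <p>
                A recovering process that chooses to kill attached
                processes could pass each process identifier it is notified
                of to a helper like the one below. This is a minimal POSIX
                sketch with a hypothetical function name: it sends
                <code class="literal">SIGTERM</code>, gives the process a grace
                period to exit, and then falls back to
                <code class="literal">SIGKILL</code>. On POSIX systems, the
                recovering process may lack permission to signal unrelated
                processes, in which case the helper reports an error.
            </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if pid no longer exists. If pid is our own child, reap it so
 * it does not linger as a zombie (which kill() would still report). */
static int process_gone(pid_t pid)
{
    if (waitpid(pid, NULL, WNOHANG) == pid)
        return 1;
    return kill(pid, 0) == -1 && errno == ESRCH;
}

/*
 * Ask pid to exit with SIGTERM, wait up to grace_sec seconds, then fall
 * back to SIGKILL. Returns 0 if the process exited after SIGTERM, 1 if
 * SIGKILL was needed, or -1 on error (including lack of permission).
 */
int terminate_process(pid_t pid, unsigned int grace_sec)
{
    struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    unsigned int i;

    if (pid <= 0)
        return -1;
    if (kill(pid, SIGTERM) == -1)
        return errno == ESRCH ? 0 : -1;   /* ESRCH: already gone */

    for (i = 0; i < grace_sec * 10; i++) {
        if (process_gone(pid))
            return 0;
        nanosleep(&tick, NULL);
    }
    if (kill(pid, SIGKILL) == -1 && errno != ESRCH)
        return -1;
    waitpid(pid, NULL, 0);  /* reaps our own child; fails harmlessly otherwise */
    return 1;
}
```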
            <p>
                The <a href="../api_reference/C/envset_timeout.html" class="olink">DB_ENV-&gt;set_timeout()</a> method's <a href="../api_reference/C/envset_timeout.html#set_timeout_DB_SET_REG_TIMEOUT" class="olink">DB_SET_REG_TIMEOUT</a>
                flag can be set to establish a wait period before
                starting recovery. This creates a window of time for
                other processes to receive the DB_EVENT_REG_PANIC
                event and exit the environment. 
            </p>
            <p>
                There are three additional requirements for the
                <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> architecture to work: 
            </p>
            <div class="itemizedlist">
              <ul type="disc">
                <li>
                  <p>
                        First, all applications using the database
                        environment must specify the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a>
                        flag when opening the environment. However,
                        there is no additional requirement if the
                        application chooses a single process to
                        recover the environment, as the first process
                        to open the database environment will know to
                        perform recovery.
                    </p>
                </li>
                <li>
                  <p> 
                        Second, there can only be a single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a>
                        handle per database environment in each
                        process. Because <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> locking is
                        per-process, not per-thread, multiple <a href="../api_reference/C/env.html" class="olink">DB_ENV</a>
                        handles for the same environment in a single
                        process could race with each other, potentially
                        causing data corruption. 
                    </p>
                </li>
                <li>
                  <p>
                        Third, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> implementation
                        does not explicitly terminate processes using
                        the database environment that is being
                        recovered. Instead, it relies on the processes
                        themselves noticing that the database environment
                        has been discarded from underneath them. For
                        this reason, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag should be
                        used with a mutex implementation that does not
                        block in the operating system, because a blocking
                        mutex risks a thread of control waiting forever
                        on a mutex that will never be granted. Any
                        test-and-set mutex implementation ensures this
                        cannot happen, and for that reason the
                        <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is generally used with a
                        test-and-set mutex implementation.
                    </p>
                </li>
              </ul>
            </div>
            <p> 
                A second solution for groups of unrelated processes
                is also based on a "watcher process". This solution is
                intended for systems where it is not practical to
                monitor the processes sharing a database environment,
                but it is possible to monitor the environment to
                detect if a thread of control has failed holding open
                Berkeley DB handles. This would be done by having a
                "watcher" process periodically call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a>
                method. If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> returns that the environment
                can no longer be used, the watcher would then take
                action, recovering the environment. 
            </p>
            <p> 
                The weakness of this approach is that all threads
                of control using the environment must specify an "ID"
                function and an "is-alive" function, using the
                <a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV-&gt;set_thread_id()</a> and <code class="literal">DB_ENV-&gt;set_isalive()</code> methods. (In other words, the
                Berkeley DB library must be able to assign a unique ID
                to each thread of control, and additionally determine
                if the thread of control is still running. It can be
                difficult to portably provide that information in
                applications using a variety of different programming
                languages and running on a variety of different
                platforms.) 
            </p>
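            <p>
                The process half of an "is-alive" check is straightforward
                on POSIX systems. The minimal sketch below (the function
                name is hypothetical) uses <code class="literal">kill()</code>
                with signal 0, which performs the existence and permission
                checks without delivering a signal. A real is-alive function
                registered with Berkeley DB would also have to account for
                individual threads, which this sketch does not attempt.
            </p>
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Process-level liveness test. kill() with signal 0 checks that pid exists
 * (and that we may signal it) without delivering anything.
 * Returns 1 if the process exists, 0 if it does not, or -1 for a pid that
 * does not name a single process.
 */
int process_is_alive(pid_t pid)
{
    if (pid <= 0)
        return -1;       /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM ? 1 : 0;   /* EPERM: exists, but not ours */
}
```
            <p>
                Two caveats apply: a process that has exited but has not yet
                been reaped by its parent still counts as alive, and process
                IDs can be reused, so a long-dead process could appear to be
                alive again.
            </p>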
            <p> 
                A third solution for groups of unrelated processes
                is a hybrid of the two above. Along with implementing
                the built-in sentinel approach with the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV-&gt;open()</a>
                method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag, the <a href="../api_reference/C/envopen.html#envopen_DB_FAILCHK" class="olink">DB_FAILCHK</a> flag can
                be specified. When using both flags, each process
                opening the database environment first checks to see
                if recovery needs to be performed. If recovery needs
                to be performed for any reason, it will first
                determine if a thread of control exited while holding
                database read locks, and release those. Then it will
                abort any unresolved transactions. If these steps are
                successful, the process opening the environment will
                continue without the need for any additional recovery.
                If these steps are unsuccessful, additional recovery is
                performed if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is specified; if
                <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
                is returned. 
            </p>
            <p> 
                Because this solution is a hybrid of the first two, all
                of the requirements of both must be met (an "ID"
                function, an "is-alive" function, a single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a>
                handle per database environment in each process, and so on).
            </p>
            <p> 
                The approaches described above are different and should
                not be combined. An application might use the
                <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> approach, or the hybrid
                approach, but not more than one of them in the same application.
                For example, a POSIX application written as a library
                underneath a wide variety of interfaces and differing
                APIs might choose the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach for a few
                reasons: first, it does not require making periodic
                calls to the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> method; second, when
                implementing in a variety of languages, it may be more
                difficult to specify unique IDs for each thread of
                control; third, it may be more difficult to determine if
                a thread of control is still running, as any
                particular thread of control is likely to lack
                sufficient permissions to signal other processes.
                Alternatively, an application with a dedicated watcher
                process, running with appropriate permissions, might
                choose the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> approach as supporting higher
                overall throughput and reliability, as that approach
                allows the application to abort unresolved
                transactions and continue forward without having to
                recover the database environment. The hybrid approach
                is useful in situations where running a dedicated
                watcher process is not practical but getting the
                equivalent of <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV-&gt;failchk()</a> during <a href="../api_reference/C/envopen.html" class="olink">DB_ENV-&gt;open()</a> is
                important.
            </p>
          </li>
        </ol>
      </div>
      <p>
        When implementing a process to monitor other threads of
        control, it is important that the watcher process's code be as
        simple and well tested as possible, because the application may
        hang if the watcher fails.
    </p>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="transapp.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Handling failure in Transactional Data Store applications </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Opening the environment</td>
        </tr>
      </table>
    </div>
  </body>
</html>
